A decision-theoretic framework is proposed for evaluating the efficiency of simulation estimators.
The  framework  includes  the  cost  of  obtaining  the  estimate  as  well  as  the  cost  of  acting  based  on  the 
estimate.  The  cost  of  obtaining  the  estimate  and  the  estimate  itself  are  represented  as  realizations  of 
jointly  distributed  stochastic  processes.  In  this  context,  the  efficiency  of  a  simulation  estimator  based 
on  a  given  computational  budget  is  defined  as  the  reciprocal  of  the  risk  (the  overall  expected  cost). 
This  framework  is  appealing  philosophically,  but  it  is  difficult  to  apply  in  practice  (e.g.,  to  compare  the 
efficiency  of  two  different  estimators)  because  only  rarely  can  the  efficiency  associated  with  a  given 
computational  budget  be  calculated.  However,  a  useful  practical  framework  emerges  in  a  large  sample 
context  when  we  consider  the  limiting  behavior  as  the  computational  budget  increases.  A  limit  theorem 
established for this model supports and extends a fairly well known efficiency principle, proposed by
Hammersley and Handscomb (1964), p. 22: “The efficiency of a Monte Carlo process may be taken as
inversely proportional to the product of the sampling variance and the amount of labour expended in
obtaining this estimate.”


Key  words:  simulation,  efficiency:  definitions  and  asymptotic  theory,  simulation,  statistical  analysis: 
asymptotic  efficiency;  statistics,  estimation:  efficiency  of  simulation  estimators. 

In this paper we develop a framework for evaluating the efficiency of alternative
simulation  estimators.  Our  goal  is  to  effectively  capture  the  interplay  between  the 
variability  of  an  estimator  and  the  computational  effort  required  to  calculate  it.  We  begin 
by  developing  what  we  regard  as  a  philosophically  appealing  decision-theoretic  model  of 
estimation  with  a  budget  constraint.  Unfortunately,  this  model  does  not  seem  so  useful  in 
practice,  because  the  efficiency  of  an  estimator  is  usually  difficult  to  calculate.  However, 
we  obtain  a  more  useful  framework  in  a  large  sample  context  by  considering  the  limiting 
behavior  as  the  computational  budget  increases.  Thus,  our  primary  focus  is  on  asymptotic 
efficiency. 

We  believe  that  our  asymptotic  efficiency  framework  provides  an  effective  means  for 
comparing  EITs  (efficiency  improvement  techniques).  Our  analysis  also  supports 
replacing  the  classical  notion  of  VRT  (variance  reduction  technique)  by  EIT.  An  example 
in  which  the  efficiency  may  be  improved  with  higher  variance  occurs  when  we  estimate  the 
mean  sojourn  time  in  a  queueing  system:  The  sample  variance  is  usually  less  if  we  use  a 
direct  sample  mean  than  if  we  use  an  indirect  estimator  based  on  the  number  in  system 
and $L = \lambda W$; see Glynn and Whitt (1989). However, it nevertheless may be more
efficient  to  use  the  indirect  estimator;  see  Nozari  and  Whitt  (1988),  p.  313.  Another 
example  in  which  an  EIT  is  associated  with  higher  sample  variance  occurs  when  estimating 
expected  discounted  costs;  see  Fox  and  Glynn  (1989a). 

In many simulation settings, our asymptotic efficiency framework provides theoretical
support for an efficiency principle proposed without much discussion by Hammersley and
Handscomb (1964), pp. 22, 51:

The efficiency of a Monte Carlo process may be taken
as inversely proportional to the product of the sampling
variance and the amount of labour expended in obtaining
this estimate.    (1)

This efficiency principle is also cited on p. 35 of Bratley, Fox and Schrage (1987) and
p.  279  of  Wilson  (1985).  (The  present  paper  is  an  extensive  revision  of  Glynn  and 
Whitt  (1986),  where  we  first  discussed  (1).) 

Efficiency principle (1) can be considered intuitively reasonable. However, as a
byproduct of our analysis, we will see that in several different estimation settings this
criterion is not appropriate and, in fact, leads to incorrect conclusions. Nevertheless, in
most  simulation  estimation  problems,  efficiency  principle  (1)  does  apply.  In  the  context  of 
such  problems,  our  paper  makes  several  contributions:  First,  we  describe  an  appropriate 
domain  of  applicability  for  (1).  Second,  we  give  a  precise  interpretation  to  the  terms 
“sampling  variance”  and  “amount  of  labour  expended”.  In  addition,  in  (1)  the  “amount 
of  labour  expended”  is  apparently  considered  deterministic.  Our  analysis  extends  the 
principle  to  the  setting  in  which  the  amount  of  labour  expended  is  itself  stochastic,  which 
is  typical  of  most  simulations. 

The  rest  of  this  paper  is  organized  as  follows.  In  Section  1  we  introduce  the  decision- 
theoretic  framework  for  estimation  with  a  budget  constraint.  In  Section  2  we  introduce  the 
concept  of  asymptotic  efficiency  of  an  estimator.  In  Section  3  we  present  a  random-time- 
change  limit  theorem  that  provides  the  basis  for  characterizing  the  asymptotic  efficiency  of 
an  estimator.  The  remaining  sections  are  primarily  devoted  to  examples  illustrating  how 
the  asymptotic  efficiency  framework  can  be  applied,  but  there  also  are  some  new 
asymptotic  efficiency  results  for  specific  estimators. 

In  Section  4  we  describe  the  canonical  case,  in  which  the  asymptotic  efficiency  is 
consistent  with  (1),  and  discuss  five  examples.  In  Sections  5  and  6  we  discuss  examples  in 
which  (1)  needs  to  be  modified,  because  there  is  a  non-canonical  estimator  convergence 
rate.  Section  5  focuses  on  subcanonical  estimator  convergence  rates,  while  Section  6 
focuses  on  supercanonical  estimator  convergence  rates.  The  examples  of  subcanonical 
estimator convergence rate discussed in Section 5 are the Kiefer-Wolfowitz (1952)
stochastic  approximation  algorithm,  for  which  we  draw  on  results  of  Ruppert  (1982);  a 
recursive  variant  of  a  derivative  estimator  discussed  by  Zazanis  and  Suri  (1986);  other 
recursive  estimators  related  to  the  replication  schemes  for  limiting  expectations  in  Fox  and 
Glynn  (1989b);  and  long-range  dependency  as  discussed  by  Cox  (1984).  Supercanonical 
estimator  convergence  rates  are  less  likely  to  occur;  the  one  example  in  Section  6  is  a 
Monte  Carlo  integration  rotation  estimator,  which  is  a  variant  of  a  rotation  estimator  in 
Fishman  and  Huang  (1983). 

In  Section  7  we  discuss  independent  replications  together  with  other  estimation 
procedures,  and  show  that  independent  replications  typically  cause  the  efficiency  to 
improve,  remain  unchanged  or  get  worse,  respectively,  when  the  estimator  convergence 
rate  is  subcanonical,  canonical  or  supercanonical.  Finally,  we  present  all  proofs  in 
Section  8. 

1.  Efficiency  with  a  Budget  Constraint 

Our  decision-theoretic  model  for  simulation  estimation  has  eight  elements: 

(i) an unknown parameter $\alpha$,

(ii) a loss function $L(a)$, a real-valued function specifying the loss associated with
estimating $\alpha$ by $a$,

(iii) the experiment, a stochastic process $(Y, C) \equiv \{[Y(t), C(t)] : t \ge t_0\}$, $t_0 \ge 0$, with
$Y(t)$ representing the time-dependent estimator of $\alpha$ and $C(t)$ representing the cost
of obtaining the estimator $Y(t)$; we refer to $Y$ as the estimation process,

(iv) a budget constraint $c$,

(v) the realized length of the experiment, $T(c) = \sup\{t \ge 0 : C(t) \le c\}$,

(vi) the budget-constrained estimator $Y(T(c))$,

(vii) the risk function $R(c) = EL(Y(T(c)))$,

(viii) the efficiency $e(c) = 1/R(c)$.

Our goal is to estimate the parameter $\alpha$ in (i). We assume that the parameter $\alpha$ is a
real number, but the same ideas apply more generally.

We  regard  estimation  as  a  special  case  of  decision  making  under  uncertainty,  so  we  use 
the  decision-theoretic  framework  advocated  by  Wald  (1950),  Savage  (1954)  and  others; 
e.g., see Chapter 1 of Ferguson (1967). Of course, the loss function $L$ in (ii) is actually a
function of $\alpha$ as well as $a$, which may be important in a decision-theoretic analysis (e.g.,
in a Bayesian analysis using a prior on $\alpha$), but we do not emphasize this aspect. We
assume that $L$ is nonnegative with $L(\alpha) = 0$. The classical squared error loss function
arises when $L(a) = (a - \alpha)^2$, but we do not restrict attention to this case.

We  have  represented  costs  and  benefits  in  two  ways:  via  the  loss  function  L  and  the 
cost  process  C.  There  are  of  course  many  different  kinds  of  costs  and  benefits  that  might 
be  considered.  Many  of  these  can  easily  be  incorporated  in  L  or  C,  but  some  cannot,  e.g., 
the  cost  of  the  analyst’s  time;  see  p.  279  of  Wilson  (1985).  Also,  unexpected  benefits 
beyond  the  original  goals  are  often  realized  from  simulation  experiments.  However,  it  is 
not  our  purpose  to  try  to  examine  all  costs  and  benefits  in  detail.  We  believe  that  the 
relatively  simple  two-cost  framework  above  captures  essential  features  for  developing  a 
useful  efficiency  principle,  especially  for  evaluating  alternative  EITs. 

Basic  to  our  approach  is  the  formulation  of  key  features  of  the  experiment  as  a 
stochastic process. In (iii) we have represented the estimator $Y(t)$ and the cost of
generating that estimator $C(t)$ as jointly distributed stochastic processes. For example, $Y$
might be a sample mean process; i.e., there might be another process $Z$ such that
$Y(t) = t^{-1} \int_0^t Z(s)\,ds$ for $t > 0$. We typically think of the cost $C(t)$ as being simply
computer time, but there could be other cost components as well. We assume that the
sample  paths  of  C  are  nondecreasing  nonnegative  real-valued  functions  of  t  that  are 
unbounded  above.  The  randomness  is  appropriate  because  the  cost  associated  with  a  given 
portion  of  the  experiment  is  indeed  often  random.  For  example,  when  a  sequential 
stopping  procedure  is  used  to  terminate  a  simulation,  the  total  number  of  observations 
generated  will  be  random.  It  is  important  that  we  make  no  assumption  about  the  joint 
distribution  of  Y  and  C,  which  in  many  applications  will  be  quite  complicated. 

The experiment is assumed to evolve in “time” $t$, where $t$ is some natural measure of
the length of the experiment. We assume that the realized length of the experiment is
$T(c)$ in (v), the “time” when the budget $c$ in (iv) is exhausted. The final budget-constrained
estimator is then $Y(T(c))$ in (vi). For example, in a regenerative simulation, $t$
(for $t$ integer) might represent the number of regenerative cycles, while the cost $C(t)$ is the
random effort required to generate those cycles. The actual estimator $Y(T(c))$ then is
based on the random number $T(c)$ of cycles achieved under the computational budget $c$.

We  define  the  efficiency  of  the  experiment  for  a  given  computational  budget  c  as  the 
reciprocal  of  the  risk  R(c)  in  (vii).  With  everything  else  held  fixed,  one  experiment  is 
said  to  be  more  efficient  than  another  if  its  efficiency  is  greater.  Of  course,  direct 
comparisons  of  this  sort  are  usually  difficult  to  make,  because  the  efficiency  is  usually 
difficult  to  calculate. 
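
The eight elements of the model can be illustrated with a small Monte Carlo sketch. It assumes a sample-mean estimation process with one random cost per observation, and it identifies the realized length $T(c)$ with the number of observations completed within the budget. The names `budget_constrained_mean`, `estimate_risk` and `example_draw`, and the exponential and uniform distributions, are hypothetical choices made only for illustration.

```python
import random

def budget_constrained_mean(budget, draw, rng):
    """Return Y(T(c)): the sample mean of the observations completed within the budget.

    `draw(rng)` is a hypothetical callback returning one (observation, cost) pair.
    """
    total, spent, n = 0.0, 0.0, 0
    while True:
        x, tau = draw(rng)
        if spent + tau > budget:   # the next observation would exceed c, so stop
            break
        spent += tau
        total += x
        n += 1
    return total / n if n > 0 else float("nan")

def estimate_risk(budget, draw, alpha, loss, reps=2000, seed=0):
    """Monte Carlo estimate of the risk R(c) = E L(Y(T(c))) and the efficiency 1 / R(c)."""
    rng = random.Random(seed)
    losses = []
    for _ in range(reps):
        y = budget_constrained_mean(budget, draw, rng)
        if y == y:                 # skip runs in which no observation fit the budget (NaN)
            losses.append(loss(y, alpha))
    risk = sum(losses) / len(losses)
    return risk, 1.0 / risk

def example_draw(rng):
    """Illustrative experiment: X ~ exponential with mean 1, cost ~ uniform on (0.5, 1.5)."""
    return rng.expovariate(1.0), rng.uniform(0.5, 1.5)
```

Calling `estimate_risk(200.0, example_draw, 1.0, lambda y, a: (y - a) ** 2)` returns an estimated risk and efficiency under squared error loss; repeating the call for several budgets shows how the efficiency grows with $c$.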

2.  Asymptotic  Efficiency 

Our  goal  now  is  to  turn  the  philosophically  appealing  model  of  Section  1  into  a 
practical basis for evaluating estimators by considering the asymptotic behavior as $c \to \infty$.
The resulting notion of asymptotic efficiency is thus applicable only in a large sample
context,  but  large  samples  are  typical  of  most  simulation  experiments. 

A  concept  of  asymptotic  efficiency  emerges  naturally  from  the  notion  of  efficiency  in 
Section  1.  In  particular,  we  say  that  one  estimator  is  more  asymptotically  efficient  than 
another  if  it  is  more  efficient  for  all  sufficiently  large  c.  It  is  significant  that  when  we 
focus  on  asymptotic  efficiency,  the  analysis  typically  simplifies  greatly.  Then  only  the 
central-limit-theorem  behavior  of  the  estimation  process  Y  and  the  law-of-large-numbers 
behavior  of  the  cost  process  C  matter;  see  §3.  Moreover,  the  specific  loss  function  often 
ceases  to  matter. 

We establish conditions, which are often verifiable, under which

$$\lim_{c \to \infty} c^{r} R(c) = v^{-1} \tag{2}$$

for positive constants $r$ and $v$. Equivalently, the efficiency $e(c) = 1/R(c)$ grows like $v c^{r}$ for large $c$. The pair $(r, v)$ is our proposed characterization of
asymptotic efficiency. We call $r$ the asymptotic efficiency rate and $v$ the asymptotic
efficiency value. To compare two experiments with asymptotic efficiency parameter pairs
$(r_1, v_1)$ and $(r_2, v_2)$, we use a lexicographic criterion. We say estimator 1 is more
asymptotically efficient than estimator 2 if $r_1 > r_2$ or if $r_1 = r_2$ and $v_1 > v_2$. If $r_1 > r_2$,
then we say that estimator 1 has a more asymptotically efficient rate. If $r_1 = r_2$, then we
say that the asymptotic relative efficiency (ARE) of estimator 1 compared to estimator 2 is
$v_1/v_2$.

Note  that  the  lexicographic  criterion  is  consistent  with  the  previous  definition,  i.e., 
estimator  1  is  more  asymptotically  efficient  than  estimator  2  with  the  lexicographic 
criterion  if  and  only  if  estimator  1  is  more  efficient  than  estimator  2  for  all  sufficiently 
large  c. 
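
The lexicographic criterion (compare the rates first, and the values only when the rates tie) can be written down directly. The sketch below is illustrative only; the names `AsymptoticEfficiency`, `more_efficient` and `asymptotic_relative_efficiency` are hypothetical, and exact floating-point equality of the rates is an assumption that real use would likely replace with a tolerance.

```python
from typing import NamedTuple

class AsymptoticEfficiency(NamedTuple):
    r: float  # asymptotic efficiency rate
    v: float  # asymptotic efficiency value

def more_efficient(e1, e2):
    """True if estimator 1 is more asymptotically efficient than estimator 2."""
    return e1.r > e2.r or (e1.r == e2.r and e1.v > e2.v)

def asymptotic_relative_efficiency(e1, e2):
    """ARE v1 / v2, defined only when the two rates coincide."""
    if e1.r != e2.r:
        raise ValueError("ARE is defined only for equal asymptotic efficiency rates")
    return e1.v / e2.v
```

A larger rate always wins, however small the value: `more_efficient(AsymptoticEfficiency(1.0, 0.1), AsymptoticEfficiency(0.8, 100.0))` is true.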

It remains to show how the basic elements $(L, Y, C)$ in the model of Section 1
determine  asymptotic  efficiency  parameters  r  and  v  in  (2).  For  this,  we  exploit  a  random- 
time-change  limit  theorem. 

3.  The  Supporting  Limit  Theorem 

We now establish a limit theorem for the budget-constrained estimation process
$Y(T(c))$ that provides a basis for characterizing the asymptotic efficiency of the experiment
$(Y, C)$. At first, we do not consider the loss function $L$.

Our key assumption for the estimation process $Y$ is a functional central limit theorem
(FCLT). For this purpose, let $D \equiv D((0, \infty), R)$ be the set of real-valued functions on the
open interval $(0, \infty)$ that are right-continuous with left limits, endowed with the standard
Skorohod $J_1$ topology, and let $\Rightarrow$ denote weak convergence (convergence in
distribution); see Billingsley (1968), Ethier and Kurtz (1986) and Whitt (1980). (We use
the open interval excluding 0 to avoid unimportant problems near the origin in estimators
such as $t^{-1} \int_0^t Z(s)\,ds$.) For each $\epsilon > 0$, let $\mathcal{Y}_\epsilon \equiv \{\mathcal{Y}_\epsilon(t) : t > 0\}$ be the random element of
$D$ defined by

$$\mathcal{Y}_\epsilon(t) = \epsilon^{-\gamma}[Y(t/\epsilon) - \alpha], \quad t > 0, \tag{3}$$

for a positive constant $\gamma$. (We assume that $Y$ is a random element of $D$.) We will assume
that $\mathcal{Y}_\epsilon \Rightarrow \mathcal{Y}$ in $D$ as $\epsilon \to 0$ for some limit process $\mathcal{Y}$, and write

$$\epsilon^{-\gamma}[Y(t/\epsilon) - \alpha] \Rightarrow \mathcal{Y}(t) \quad \text{in } D \text{ as } \epsilon \to 0. \tag{4}$$

For practical purposes, the FCLT (4) is essentially equivalent to the ordinary CLT in $R$
obtained by focusing on a single $t$ in (4), say $t = 1$, but an ordinary CLT is, technically
speaking, slightly weaker than a FCLT. (See Example 1 of Glynn and Whitt (1988).) An
easy consequence of (4) is that $Y(t) \stackrel{P}{\to} \alpha$ as $t \to \infty$, where $\stackrel{P}{\to}$ denotes convergence in
probability. It is significant that (4) holds in great generality, so that this assumption is
indeed typically satisfied. In most cases, $\gamma = 1/2$ (the canonical convergence rate) and
$\mathcal{Y}(t)$ is $\sigma t^{-1} B(t)$, where $B(t)$ is standard (zero-drift and unit variance) Brownian motion,
so that $\mathcal{Y}(1)$ has a zero-mean Gaussian distribution with variance $\sigma^2$, but there are other
possibilities.

We  also  will  assume  limiting  behavior  for  the  cost  process  C,  but  it  is  significant  that 
we  need  to  know  much  less  about  C.  In  particular,  we  only  assume  a  simple  stochastic 
growth  condition  corresponding  to  an  ordinary  strong  law  of  large  numbers  (SLLN)  for  C; 
i.e.,  we  will  assume  that 

$$t^{-\beta} C(t) \to \lambda^{-\beta} \quad \text{w.p.1 as } t \to \infty, \tag{5}$$

where $\lambda$ and $\beta$ are positive constants. Typically $\beta = 1$, but we give examples in which $\beta \ne 1$;
see Example 6.1.

It  is  significant  that  we  do  not  directly  assume  anything  about  the  joint  behavior  of  Y 
and  C.  It  turns  out  that  we  are  able  to  establish  the  joint  limiting  behavior  for  Y  and  C, 
and thus the limiting behavior for the final budget-constrained estimator $Y(T(c))$, from
these assumptions alone. Let $\stackrel{d}{=}$ denote equality in distribution.

Theorem 1. If the FCLT (4) holds for the estimation process $Y(t)$ with the limit process
$\mathcal{Y}(t)$ being continuous at $t$ w.p.1 for each $t$ and the SLLN (5) holds for the cost process
$C(t)$, then a FCLT holds for the budget-constrained estimation process $Y(T(c))$, i.e.,

$$c^{\gamma/\beta}[Y(T(ct)) - \alpha] \Rightarrow \mathcal{Y}(\lambda t^{1/\beta}) \quad \text{in } D \text{ as } c \to \infty, \tag{6}$$

the associated CLT holds, i.e.,

$$c^{\gamma/\beta}[Y(T(c)) - \alpha] \Rightarrow \mathcal{Y}(\lambda) \stackrel{d}{=} \lambda^{-\gamma}\mathcal{Y}(1) \quad \text{in } R \text{ as } c \to \infty, \tag{7}$$

and the associated WLLN holds, i.e.,

$$Y(T(c)) \stackrel{P}{\to} \alpha \quad \text{as } c \to \infty. \tag{8}$$

To prove Theorem 1, we first relate the SLLN for $C(t)$ in (5) to an associated SLLN
for $T(c)$ in §1(v). This result is well known for $\beta = 1$, and essentially the same proof
works for $\beta \ne 1$.

Lemma 1. Let $\lambda$ and $\beta$ be strictly positive constants. Then $t^{-\beta} C(t) \to \lambda^{-\beta}$ w.p.1 as $t \to \infty$
if and only if $c^{-1/\beta} T(c) \to \lambda$ w.p.1 as $c \to \infty$.

Next, as in Theorem 4 of Glynn and Whitt (1988), we note that the ordinary SLLN for
$T(c)$ established in Lemma 1 is actually equivalent to a FSLLN (a functional version).

Lemma 2. If $c^{-1/\beta} T(c) \to \lambda$ w.p.1, then $c^{-1/\beta} T(ct) \to \lambda t^{1/\beta}$ w.p.1 in $D([0, \infty), R)$, i.e.,
$\sup_{0 \le t \le T} |c^{-1/\beta} T(ct) - \lambda t^{1/\beta}| \to 0$ w.p.1 as $c \to \infty$ for all $T$ (where the bound $T$ is a constant, not the process $T(\cdot)$).

Finally,  we  apply  the  continuous  mapping  theorem  with  the  composition  map,  as  in 
Section  17  of  Billingsley  and  Section  3  of  Whitt  (1980);  see  Section  8  for  the  details. 
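
The inverse relation between the growth of the cost process and the growth of the realized length, as stated in Lemma 1, can be checked numerically in the simplest possible setting. The sketch below assumes a deterministic cost path $C(t) = \lambda^{-\beta} t^{\beta}$, which satisfies (5) exactly, and recovers $T(c) = \sup\{t \ge 0 : C(t) \le c\}$ by bisection. The function names and the upper search bound `hi` are hypothetical choices for illustration.

```python
def cost(t, lam, beta):
    """Deterministic cost path C(t) = lam**(-beta) * t**beta, which satisfies (5) exactly."""
    return lam ** (-beta) * t ** beta

def realized_length(c, lam, beta, hi=1e12, iters=200):
    """T(c) = sup{t >= 0 : C(t) <= c}, found by bisection on the nondecreasing path C.

    Assumes T(c) < hi, which is an arbitrary illustrative bound.
    """
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cost(mid, lam, beta) <= c:
            lo = mid
        else:
            hi = mid
    return lo
```

For this path, `realized_length(c, lam, beta) / c ** (1.0 / beta)` equals `lam` up to rounding for every budget, which is the limit asserted in Lemma 1.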

Remarks  (3.1)  For  our  applications,  we  only  use  the  CLT  (7),  but  the  FCLT  (6)  can  be 
useful  as  well.  The  FCLT  condition  (4)  is  needed  in  Theorem  1  even  to  get  the  CLT  (7); 
see  Example  4  of  Glynn  and  Whitt  (1988).  We  could  work  with  the  CLT  version  of  (4) 
instead  of  the  FCLT  if  we  added  extra  conditions,  such  as  independence  or  the  Anscombe 
(1952)  condition;  see  p.  15  of  Gut  (1988).  The  Anscombe  condition  is  closely  related  to 
the  tightness  associated  with  the  FCLT;  see  p.  55  of  Billingsley. 

(3.2) Typically the limit process $\mathcal{Y}(t)$ in (4) is $\sigma t^{-1} B(t)$, where $B(t)$ is standard
Brownian motion, which has continuous sample paths, but we do not require that the
sample paths of $\mathcal{Y}$ be continuous. For example, $B$ could be replaced by a stable process,
which occurs as the limit for normalized partial sums of i.i.d. random variables when the
random variables have infinite variance. For applications, see Mandelbrot (1963) and
Fama (1963).

(3.3) In the vast majority of cases $\mathcal{Y}(1) \stackrel{d}{=} \sigma N(0,1)$, i.e., $\mathcal{Y}(1)$ has a centered Gaussian
distribution with variance $\sigma^2$. In this case, the practical implication of the CLT (7) is that

$$Y(T(c)) \approx \alpha + \frac{\lambda^{-\gamma}\sigma}{c^{\gamma/\beta}} N(0,1) \quad \text{for large } c. \tag{9}$$

If all candidate experiments satisfy (9), then to achieve high asymptotic efficiency, without
considering any loss functions, it is natural to first maximize the convergence rate $\gamma/\beta$ and
then minimize $\lambda^{-\gamma}\sigma$. Indeed, given the approximation (9), $|Y(T(c)) - \alpha|$,
$[Y(T(c)) - \alpha]^+$ and $[Y(T(c)) - \alpha]^-$, where $x^+ = \max\{x, 0\}$ and $x^- = -\min\{x, 0\}$, are all
minimized in the stochastic order sense with this criterion. Indeed, this is true provided
the estimators satisfy $\mathcal{Y}_i(1) \stackrel{d}{=} \sigma_i Z$ for a fixed random variable $Z$.

(3.4) In (4) and (6) we consider limits as $\epsilon \to 0$ and $c \to \infty$. This is not different in any
essential way from considering limits involving a sequence, e.g., $\epsilon = 1/n$ for $n$ integer; see
p. 16 of Billingsley. ■

We  now  consider  the  asymptotic  behavior  of  the  risk  associated  with  a  large  class  of 
loss functions. Unlike Remark 3.3, we now do not require that $\mathcal{Y}(1) \stackrel{d}{=} \sigma N(0,1)$. In
order to get convergence of moments from convergence in distribution, we assume
uniform integrability; there are many sufficient conditions; see p. 32 of Billingsley and
Sections I.7, I.8 and II.5 of Gut (1988). In practice, we would rarely worry about this
technical  condition.  A  simple  sufficient  condition  is  for  the  loss  function  to  be  bounded. 

Corollary 1. In addition to the assumptions of Theorem 1, suppose that the loss function
$L$ has two continuous derivatives with $L'(\alpha) = 0$ and $L''(\alpha) > 0$ and
$\{c^{2\gamma/\beta} L(Y(T(c))) : c \ge 1\}$ is uniformly integrable. Then

$$\lim_{c \to \infty} c^{2\gamma/\beta} R(c) = 2^{-1} L''(\alpha) \lambda^{-2\gamma} E[\mathcal{Y}(1)^2], \tag{10}$$

so that the asymptotic efficiency parameters are

$$r = \frac{2\gamma}{\beta} \quad \text{and} \quad v = \frac{2}{L''(\alpha) \lambda^{-2\gamma} E[\mathcal{Y}(1)^2]}. \tag{11}$$

Remarks (3.5) If $L$ is twice differentiable and $\alpha$ is a strict local minimizer of $L$, then the
conditions $L'(\alpha) = 0$ and $L''(\alpha) > 0$ must be satisfied.

(3.6) Suppose that we compare two estimators satisfying the assumptions of
Corollary 1 with common $\beta$ and $\gamma$, but with subcanonical convergence rate $\gamma < 1/2$ so that
$r < 1$. Moreover, suppose that estimator 1’s asymptotic mean square error is half that of
estimator 2 (i.e., $E\mathcal{Y}_1(1)^2 = E\mathcal{Y}_2(1)^2/2$), while the cost rate is twice that of estimator 2
(i.e., $\lambda_1^{-1} = 2\lambda_2^{-1}$). Then the asymptotic efficiency value of estimator 1 is greater than that
of estimator 2 by a factor of $2^{1-2\gamma}$. This, of course, is inconsistent with the
Hammersley-Handscomb efficiency criterion (which was clearly formulated with $\gamma = 1/2$ in mind). This
analysis indicates that the variability of the estimator tends to be more important when
there is subcanonical convergence. Thus, in the trade-off between variance reduction and
(possible) additional computational complexity, variance reduction usually is top priority
when $\gamma < 1/2$. ■
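
The comparison in Remark 3.6 can be reproduced numerically. The sketch below assumes the asymptotic efficiency parameters of Corollary 1, namely rate $2\gamma/\beta$ and value $2 / (L''(\alpha) \lambda^{-2\gamma} E[\mathcal{Y}(1)^2])$; the function name `efficiency_params` and the numerical inputs are hypothetical.

```python
def efficiency_params(gamma, beta, lam, d2_loss, second_moment):
    """Return (r, v) with r = 2*gamma/beta and
    v = 2 / (L''(alpha) * lam**(-2*gamma) * E[Y(1)^2])."""
    r = 2.0 * gamma / beta
    v = 2.0 / (d2_loss * lam ** (-2.0 * gamma) * second_moment)
    return r, v

def value_ratio(gamma):
    """v1 / v2 when estimator 1 has half the mean square error and twice the cost rate."""
    _, v1 = efficiency_params(gamma, 1.0, 0.5, 2.0, 0.5)  # cost rate 1/lam = 2
    _, v2 = efficiency_params(gamma, 1.0, 1.0, 2.0, 1.0)  # cost rate 1/lam = 1
    return v1 / v2
```

With `gamma = 0.5` the ratio is 1, so the two estimators tie as the Hammersley-Handscomb principle predicts; with any `gamma < 0.5` the ratio `2 ** (1 - 2 * gamma)` exceeds 1 and the lower-variance estimator is preferred.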

It  is  significant  that  the  form  of  the  loss  function  does  not  affect  asymptotic  efficiency 
under  the  assumptions  of  Corollary  1,  because  L  appears  in  r  and  v  only  through  the 
constant  multiple  L"(a)  in  v.  In  other  words,  if  two  candidate  estimators  satisfy  the 
assumptions  of  Corollary  1,  then  our  lexicographic  efficiency  criterion  provides  a  ranking 
that  is  independent  of  the  specific  form  of  the  loss  function.  Specifically,  estimator  1  is 
more asymptotically efficient than estimator 2 if $\gamma_1/\beta_1 > \gamma_2/\beta_2$ or if $\gamma_1/\beta_1 = \gamma_2/\beta_2$ and
$\lambda_2^{-2\gamma_2} E[\mathcal{Y}_2(1)^2] > \lambda_1^{-2\gamma_1} E[\mathcal{Y}_1(1)^2]$; note that the criterion is independent of $L$. Moreover,
note that (11) is consistent with (9) when $\mathcal{Y}(1) \stackrel{d}{=} \sigma N(0,1)$. Formulas (9) and (11)
complement each other, because (9) applies to loss functions beyond those considered in
Corollary 1, whereas (11) applies to non-Gaussian distributions (and distributions not all
related by a scale transformation).

Of  course,  the  loss  function  need  not  satisfy  the  conditions  of  Corollary  1,  but  other 
cases  can  be  treated  in  a  similar  way.  For  example,  here  is  another  natural  case. 

Corollary 2. In addition to the conditions of Theorem 1, suppose that $L(a) = |a - \alpha|^p$
for $p > 0$. If $\{c^{p\gamma/\beta} |Y(T(c)) - \alpha|^p : c \ge 1\}$ is uniformly integrable, then

$$\lim_{c \to \infty} c^{p\gamma/\beta} R(c) = \lambda^{-p\gamma} E[|\mathcal{Y}(1)|^p], \tag{12}$$

so that the asymptotic efficiency parameters are

$$r = \frac{p\gamma}{\beta} \quad \text{and} \quad v = \frac{1}{\lambda^{-p\gamma} E[|\mathcal{Y}(1)|^p]}. \tag{13}$$

In general, (13) is not consistent with (11), so that the form of the loss function can
matter. However, when $\beta$ is fixed and $\mathcal{Y}(1) \stackrel{d}{=} \sigma N(0,1)$, (13) is fully consistent with (9)
and (11), and $E[|\mathcal{Y}(1)|^p] = \sigma^p E[|N(0,1)|^p]$.
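
In the Gaussian case the parameters for the power loss $|a - \alpha|^p$ can be evaluated in closed form, because $E[|N(0,1)|^p] = 2^{p/2}\,\Gamma((p+1)/2)/\sqrt{\pi}$ (a standard identity, not taken from the paper). The following sketch, with hypothetical function names, computes rate $p\gamma/\beta$ and value $1/(\lambda^{-p\gamma}\sigma^p E[|N(0,1)|^p])$ under that Gaussian assumption.

```python
import math

def abs_normal_moment(p):
    """E[|N(0,1)|^p] = 2**(p/2) * Gamma((p+1)/2) / sqrt(pi), valid for p > 0."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)

def power_loss_params(p, gamma, beta, lam, sigma):
    """(r, v) for loss |a - alpha|**p when the limit Y(1) is distributed as sigma * N(0,1)."""
    r = p * gamma / beta
    v = 1.0 / (lam ** (-p * gamma) * sigma ** p * abs_normal_moment(p))
    return r, v
```

For $p = 2$ the loss is squared error with second derivative 2, and the value returned agrees with $2/(L''(\alpha)\lambda^{-2\gamma}\sigma^2) = 1/(\lambda^{-2\gamma}\sigma^2)$, the value obtained for twice-differentiable loss functions.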

Remark (3.7) In general, the cost process $C(t)$ can affect the asymptotic efficiency rate $r$
through $\beta$, but in the canonical case of $\beta = 1$, the asymptotic efficiency rate is determined
solely by the estimator convergence rate $\gamma$. Hence, to achieve maximum asymptotic
efficiency when $\beta = 1$, the first objective is to maximize the estimator convergence rate $\gamma$.
Then, among those estimators with maximum estimator convergence rate, we want to
maximize the asymptotic efficiency value. ■

Obviously the cost process usually grows linearly, so that $\beta = 1$ in (5). However,
other cases do arise, as is illustrated here in Example 6.1. The following variant of the
SLLN for martingale differences is a basis for establishing the required nonlinear SLLN
for $C(t)$ when $C(t) = \sum_{i=1}^{\lfloor t \rfloor} \tau_i$, $t \ge 0$, where $\lfloor t \rfloor$ is the greatest integer less than or equal to
$t$. We allow the variables $\tau_i$ to be dependent as well as non-identically distributed, but
most of our applications are under the extra condition of independence. (For an
exception, see Example 4.5.)

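Theorem 2. Let $\{\tau_n : n \ge 1\}$ be a sequence of real-valued r.v.’s on an underlying
probability space $(\Omega, \mathcal{F}, P)$ and let $\{\mathcal{F}_n : n \ge 1\}$ be an increasing sequence of sub-$\sigma$-fields of
$\mathcal{F}$ such that $\tau_n$ is measurable with respect to $\mathcal{F}_n$. If

$$n^{-b} E(\tau_n \mid \mathcal{F}_{n-1}) \to a \quad \text{w.p.1 as } n \to \infty$$

and

$$n^{-d} \operatorname{Var}[\tau_n - E(\tau_n \mid \mathcal{F}_{n-1})] \to c \quad \text{as } n \to \infty$$

with $b > -1$ and $d < 2b + 1$, then

$$n^{-b-1} \sum_{i=1}^{n} \tau_i \to a/(1 + b) \quad \text{w.p.1 as } n \to \infty.$$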
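
A quick numerical illustration of the limit in Theorem 2 is sketched below under simplifying assumptions: the $\tau_i$ are independent with mean $a i^b$ and variance $c i^d$, so the conditional mean and variance conditions hold trivially. The Gaussian noise and the function name `normalized_cost_sum` are hypothetical choices; Gaussian variables can be negative, so this illustrates the limit theorem rather than a realistic cost model.

```python
import random

def normalized_cost_sum(n, a, b, c=0.0, d=0.0, seed=0):
    """Return n**(-b-1) * (tau_1 + ... + tau_n) for independent tau_i with
    mean a * i**b and variance c * i**d (Gaussian noise, an illustrative choice)."""
    rng = random.Random(seed)
    total = 0.0
    for i in range(1, n + 1):
        total += a * i ** b + (c * i ** d) ** 0.5 * rng.gauss(0.0, 1.0)
    return total / n ** (b + 1)
```

With `a = 2`, `b = 1`, `c = 1` and `d = 1` (so that `d < 2 * b + 1`), the normalized sum for large `n` is close to `a / (1 + b) = 1`.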


4.  The  Canonical  Case 


The canonical case arises when the conditions of Corollary 1 to Theorem 1 hold with
$\gamma = 1/2$, $\beta = 1$ and $\mathcal{Y}(t) = t^{-1}\sigma B(t)$ for each $t$, where $B$ is standard Brownian motion, so
that $\mathcal{Y}(1) \stackrel{d}{=} \sigma N(0,1)$, i.e., $\mathcal{Y}(1)$ has a zero-mean Gaussian distribution with
variance $\sigma^2$. From (11), the asymptotic efficiency rate is then $r = 1$ and all interest
centers on the asymptotic value, which is

$$v = \frac{K}{\lambda^{-1}\sigma^2} \quad \text{for} \quad K = \frac{2}{L''(\alpha)}. \tag{14}$$


We  interpret  (14)  as  support  for  (1): 

Asymptotic Efficiency Principle. In the canonical case, the asymptotic efficiency value $v$ may
be taken as inversely proportional to the product of the sampling variance rate $\sigma^2$ and the
cost rate $\lambda^{-1}$.

This interpretation for the cost rate $\lambda^{-1}$ is clear from (5). For $\sigma^2$, note that under the
conditions of Corollary 1 to Theorem 1 (which include uniform integrability),
$t \operatorname{Var} Y(t) \to \sigma^2$ as $t \to \infty$, so that $\operatorname{Var} Y(t) \approx \sigma^2/t$. In this setting, we can take the
asymptotic value $v$ as being inversely proportional to the product of the sampling variance
$\operatorname{Var} Y(t) = \sigma^2/t + o(t^{-1})$ and the cost $C(t) = \lambda^{-1} t + o(t)$.

The  rest  of  this  paper  is  primarily  devoted  to  examples.  The  five  examples  in  this 
section  all  produce  the  canonical  case,  for  which  the  asymptotic  efficiency  principle  above 
and  (1)  are  appropriate. 

Example 4.1. Independent Replications. Suppose that $\alpha$ can be represented as $\alpha = EX$ for
some random variable (r.v.) $X$. ($X$ might correspond to the number of customers served in
a queue during the time interval $[a, b]$.) Then $\alpha$ can be estimated by the sample mean
$\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_i$, where $X_1, X_2, \ldots$ are i.i.d. copies of $X$. Then the estimation process here
is $Y(t) = \bar{X}_{\lfloor t \rfloor}$, $t \ge 1$ (where again $\lfloor t \rfloor$ is the greatest integer less than or equal to $t$).

Let $\tau_i$ be the amount of computer time required to generate $X_i$. Disregarding the
usually negligible amount of computer time required to initialize the simulation and
compute $Y(t)$ from the $X_i$’s, we let the cost process be $C(t) = \sum_{i=1}^{\lfloor t \rfloor} \tau_i$. It seems reasonable
to assume that the $\tau_i$’s are positive i.i.d. r.v.’s. For most applications, $\tau_i$ will indeed be
random. For example, in any algorithm in which acceptance/rejection is used as a variate
generation technique, $\tau_i$ will be random.

If $0 < \sigma^2 < \infty$, where $\sigma^2 = \operatorname{Var} X$, then the FCLT (4) holds for $Y(t)$ with $\gamma = 1/2$
and $\mathcal{Y}(t) = t^{-1}\sigma B(t)$, with $B$ being standard Brownian motion, by Donsker’s theorem,
p. 137 of Billingsley (1968). If $0 < E\tau_1 < \infty$, then the SLLN (5) holds for $C(t)$ with
$\beta = 1$. Hence, the conditions of Theorem 1 hold for the canonical case. Moreover, for
this example, these assumptions also imply the uniform integrability needed for Corollary
1; see pp. 32, 54 of Gut (1988).
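
The canonical behavior in this example can be checked by simulation. The sketch below assumes a specific, hypothetical experiment: $X$ has density $2x$ on $[0, 1]$ and is generated by acceptance/rejection from uniform proposals, and the cost $\tau_i$ is the number of proposals used, so that $E\tau_i = 2$ and $\operatorname{Var} X = 1/18$. Under squared error loss the product $c \cdot E[(Y(T(c)) - \alpha)^2]$ should then be close to $\sigma^2 E\tau_i = 1/9$ for large $c$. The function names are illustrative only.

```python
import random

def accept_reject_draw(rng):
    """Draw X with density 2x on [0, 1] by acceptance/rejection.

    Returns (X, number of proposals); the proposal count plays the role of the cost tau.
    """
    proposals = 0
    while True:
        proposals += 1
        x = rng.random()
        if rng.random() <= x:      # accept the proposal x with probability x
            return x, proposals

def scaled_mse(budget, alpha, reps=2000, seed=1):
    """Estimate c * E[(Y(T(c)) - alpha)^2] for the budget-constrained sample mean."""
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(reps):
        s, n, spent = 0.0, 0, 0
        while True:
            x, tau = accept_reject_draw(rng)
            if spent + tau > budget:   # budget exhausted: discard the unfinished draw
                break
            spent += tau
            s += x
            n += 1
        total_sq += (s / n - alpha) ** 2
    return budget * total_sq / reps
```

Here `scaled_mse(400, 2.0 / 3.0)` estimates the reciprocal of the asymptotic efficiency value for this experiment, since $\alpha = EX = 2/3$.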

Example  4.2.  Functions  of  Mean  Vectors  and  Regenerative  Simulation.  Let 
X  ■  [X(l),  .  .  .  ,X(d )]  be  an  Revalued  random  vector  with  p.  =  EX.  Suppose  that 
a  =  g(p.)  for  some  known  smooth  function  g:Rd  -R.  In  this  case  the  estimation  process 
is  Y(t)  =  g  (X  [,j ) ,  r  a  1,  where  X„  is  the  sample  mean  of  i.i.d.  random  vectors  distributed 
as  X.  This  estimation  process  arises  with  ratio  estimators ;  then  d  =  2,  g(xltx2)  =  *i/*2 
and  a  =  £X(1)/£X(2).  A  ratio  estimator  is  often  used  with  the  regenerative  simulation 
method  to  calculate  the  steady-state  mean  of  a  real-valued  regenerative  process  Z.  Then 
X(l)  is  the  integral  of  Z  over  a  regenerative  cycle  and  X(2)  is  the  duration  of  the  cycle. 

Let the cost of generating the $i$th cycle be $\tau_i$ and let the cost process $C(t)$ be defined as in Example 4.1. If the computational time to generate the cycle can be regarded as approximately equal to the length of the cycle, then $\tau_i = X_i(2)$, which is of course typically random.

The following theorem establishes the FCLT condition in Theorem 1, without requiring that the random vectors $X_i$ actually be i.i.d. (We apply this extension in Example 4.4 below.)

Theorem 3. Suppose that $\gamma > 0$, $\mu \in R^d$ and

$$\epsilon^{-\gamma}[\psi_\epsilon(t) - \mu] \Rightarrow \psi(t) \quad \text{in } D \text{ as } \epsilon \to 0,$$

where $\psi_\epsilon$ and $\psi$ are random elements of $D$. If $g: R^d \to R^1$ is continuously differentiable in a neighborhood of $\mu$, then

$$\epsilon^{-\gamma}[g(\psi_\epsilon(t)) - g(\mu)] \Rightarrow \nabla g(\mu)\psi(t) \quad \text{in } D \text{ as } \epsilon \to 0.$$

Returning to our example, we assume that $E\|X\|^2 < \infty$, where $\|\cdot\|$ is the Euclidean norm in $R^d$. Then the multivariate version of Donsker's theorem implies that

$$n^{-1/2} \sum_{i=1}^{\lfloor nt \rfloor} (X_i - \mu) \Rightarrow \Gamma^{1/2} B(t) \quad \text{in } D \text{ as } n \to \infty,$$

where $B$ is a standard Brownian motion in $R^d$ (with $d$ mutually independent 1-dimensional marginal standard Brownian motions) and $\Gamma$ is the covariance matrix of $X$. The matrix $\Gamma^{1/2}$ is not uniquely specified by $\Gamma$, but may be taken as the lower triangular matrix obtained by Cholesky factorization; see p. 84 of Feller (1971) and p. 165 of Bratley, Fox and Schrage. Hence,

$$\epsilon^{-1/2} \left[ \frac{1}{\lfloor t/\epsilon \rfloor} \sum_{i=1}^{\lfloor t/\epsilon \rfloor} X_i - \mu \right] \Rightarrow \Gamma^{1/2} B(t)/t \quad \text{in } D \text{ as } \epsilon \to 0.$$

Finally, by Theorem 3, if $g$ is continuously differentiable in the neighborhood of $\mu$, then $\mathcal{Y}_\epsilon \Rightarrow \mathcal{Y}$ in $D$, where $\mathcal{Y}(t) \equiv \nabla g(\mu)\Gamma^{1/2}B(t)/t$. The remaining conditions in Theorem 1 and Corollary 1 hold as in Example 4.1. (For the uniform integrability, it suffices to treat the marginals separately.) Then we have the canonical case, i.e., the limits in (7) and (8) with $\gamma = 1/2$, $\beta = 1$, $\lambda^{-1} = E\tau_1$, $\mathcal{Y}(1) \stackrel{d}{=} \sigma N(0,1)$ and

$$\sigma^2 = \nabla g(\mu)\,\Gamma\,\nabla g(\mu)'. \tag{15}$$
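
For the ratio estimator, the quadratic form in (15) reduces to a short computation. The following is a minimal sketch; the function name `ratio_delta_variance` and the numerical inputs are hypothetical.

```python
def ratio_delta_variance(mu, gamma):
    """Return grad g(mu) * Gamma * grad g(mu)' for g(x1, x2) = x1 / x2.

    `mu` is the mean vector (mu1, mu2) and `gamma` the 2x2 covariance
    matrix of X, given as nested sequences.
    """
    m1, m2 = mu
    grad = (1.0 / m2, -m1 / m2 ** 2)  # partial derivatives of x1 / x2 at mu
    return sum(grad[i] * gamma[i][j] * grad[j]
               for i in range(2) for j in range(2))


if __name__ == "__main__":
    # Hypothetical cycle statistics: E[X(1)] = 2, E[X(2)] = 4.
    print(ratio_delta_variance((2.0, 4.0), [[1.0, 0.5], [0.5, 2.0]]))
```

In the regenerative setting with $\tau_i = X_i(2)$, multiplying this value by $EX(2)$ gives the variance-times-cost product used to compare estimators.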

Example 4.3. Steady-State Means. Suppose that $\alpha$ is the steady-state mean of a real-valued stochastic process $X \equiv \{X(t): t \ge 0\}$, and suppose that we intend to estimate $\alpha$ with the sample mean process $Y(t) \equiv \bar{X}(t) = t^{-1}\int_0^t X(s)\,ds$, $t > 0$. Thus, we assume a FCLT for the cumulative process associated with $X$, i.e.,

$$\epsilon^{1/2}\left[\int_0^{t/\epsilon} X(s)\,ds - \alpha t/\epsilon\right] \Rightarrow \sigma B(t) \quad \text{in } D \text{ as } \epsilon \to 0, \tag{16}$$

which, as in Example 4.2, immediately implies $\mathcal{Y}_\epsilon \Rightarrow \mathcal{Y}$ as $\epsilon \to 0$, where $\mathcal{Y}(t) = \sigma B(t)/t$.

A  variety  of  different  assumptions  on  the  structure  of  X  give  rise  to  such  a  FCLT.  For 
example,  there  are  FCLTs  of  the  form  (16)  when  X  is  stationary  and  satisfies  a  mixing 
condition  (e.g.,  Section  20  of  Billingsley),  when  X  is  regenerative  (e.g.,  Glynn  and  Whitt 
(1987))  and  when  X  is  a  martingale  (e.g..  Chapter  7  of  Ethier  and  Kurtz  (1986)).  The 
great  variety  of  very  robust  hypotheses  which  lead  to  FCLTs  of  the  form  (16)  lead  us  to 
view  (16)  as  a  very  general  assumption,  which  can  be  expected  to  hold  for  virtually  all 
“real  world”  steady-state  simulations. 

Remark (4.1) Suppose that $X(t) \Rightarrow X(\infty)$ as $t \to \infty$. It is important to note that it is typically not the case that $\sigma^2 = \mathrm{Var}\,X(\infty)$ for $\sigma$ in (16). The constant $\sigma^2$ in (16) is the time-average variance constant of $X$, which reflects the correlation structure of $X$. In particular, if $X$ is a uniformly integrable stationary stochastic process having an integrable covariance function, then

$$\sigma^2 = 2\int_0^\infty \mathrm{Cov}(X(0), X(t))\,dt. \tag{17}$$

Formulas for $\sigma^2$ when $X$ is a function of a Markov process appear in Glynn (1984), Whitt (1989) and references cited there. The time-average variance constant $\sigma^2$ is difficult to estimate. Consequently, much attention has been devoted in the simulation literature to its estimation; see Section 3.3 of Bratley, Fox, and Schrage (1987). ■
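
The gap between the time-average variance constant and the marginal variance is easy to see numerically. The sketch below assumes that the constant is twice the integral of the covariance function and uses a hypothetical exponentially decaying covariance $\mathrm{Cov}(X(0), X(s)) = v e^{-rs}$; the names `time_average_variance` and `ou_cov` are illustrative.

```python
import math


def time_average_variance(cov, upper=60.0, steps=60_000):
    """Approximate 2 * integral_0^upper cov(s) ds with the trapezoidal rule."""
    h = upper / steps
    total = 0.5 * (cov(0.0) + cov(upper))
    for i in range(1, steps):
        total += cov(i * h)
    return 2.0 * h * total


def ou_cov(s, v=1.0, rate=0.5):
    # Hypothetical covariance function Cov(X(0), X(s)) = v * exp(-rate * s).
    return v * math.exp(-rate * s)


if __name__ == "__main__":
    # The exact value is 2 * v / rate = 4, four times the marginal variance v = 1.
    print(time_average_variance(ou_cov))
```

With slowly decaying correlations the constant can exceed the marginal variance by a large factor, which is why using $\mathrm{Var}\,X(\infty)$ in its place would overstate efficiency.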

Turning to the process $C(t)$, we assume a SLLN of the form (5) holds with $\beta = 1$. As in the case of assumption (16), a wide variety of steady-state simulations possess behavior that is characterized by such a SLLN. For example, suppose that the process $X$ takes the form $X(t) = f(Z(t))$ for some real-valued function $f$. One then simulates $X$ by simulating $Z$. It seems reasonable to assume that $C(t) = \int_0^t h(Z(s))\,ds$ for some nonnegative real-valued $h$. If $Z$ is a positive recurrent regenerative process, then (5) is known to hold with $\beta = 1$ under suitable moment conditions. Similarly, if $Z(t)$ is stationary and ergodic, then so is $h(Z(t))$, so that we obtain (5) with $\beta = 1$ if $Eh(Z(t)) < \infty$. As with (16), we view (5) as a relatively mild regularity hypothesis on the steady-state simulation. In this example the constant $\lambda$ can be viewed as the rate at which simulation time is produced as a function of computation time.

Example 4.4. Functions of Steady-State Means. As a generalization of Examples 4.2 and 4.3, suppose that $\alpha = g(\mu)$, where $\mu$ is the steady-state mean of an $R^d$-valued process $X$, and that $Y(t) = g(\bar{X}(t))$, where $\bar{X}(t)$ is the sample mean. This kind of estimator arises in calculating the steady-state conditional probability $P(X(t) \in A \mid X(t) \in B) = P(X(t) \in A \cap B)/P(X(t) \in B)$ and in estimating the steady-state variance. (In this case, $g$ is again the ratio functional $g(x_1, x_2) = x_1/x_2$.) We can combine a $d$-dimensional analog of (16) with Theorem 3 to obtain the desired FCLT $\mathcal{Y}_\epsilon \Rightarrow \mathcal{Y}$ as $\epsilon \to 0$. As in Example 4.2, if $\mathcal{Y}(t) = \nabla g(\mu)\Gamma^{1/2}B(t)/t$, then $\sigma^2$ is given by (15). The cost can be treated as in Example 4.3.

Remark (4.2) As in Remark 4.1, here $\Gamma$ is not the covariance of the steady-state variable $X(\infty)$. For a reasonably behaved stationary process, $\Gamma$ can be represented as

$$\Gamma = \int_0^\infty E(X(0) - \mu)'(X(s) - \mu)\,ds + \int_0^\infty E(X(s) - \mu)'(X(0) - \mu)\,ds.$$

As in the scalar case, $\Gamma$ is hard to estimate.

Example 4.5. The Robbins-Monro Stochastic Approximation Algorithm. To depart from the familiar sample mean setting, we now briefly consider the Robbins-Monro (1951) stochastic approximation algorithm (denoted by RM), which is finding application in simulation; see Wasan (1969), Kushner and Clark (1978) and Glynn (1986). Our goal is to find the parameter $\alpha \equiv \theta^*$ that minimizes a smooth function $\beta(\theta)$. We assume that there exist random variables $Z(\theta)$ such that $\beta'(\theta) = EZ(\theta)$. To calculate $\theta^*$, we use the RM algorithm $\theta_{n+1} = \theta_n - c_n X_{n+1}$, $n \ge 0$, where $\{c_n: n \ge 0\}$ is a sequence of deterministic nonnegative constants and $X_{n+1}$ is independently generated, conditional on $\theta_n$, i.e.,

$$P(X_{n+1} \in A \mid \theta_0, X_0, \ldots, \theta_n, X_n) = P(Z(\theta_n) \in A).$$

Then the estimation process is defined by $Y(t) = \theta_{\lfloor t \rfloor}$, $t \ge 0$. We assume that the time $\tau_{n+1}$ required to calculate $\theta_{n+1}$ from $\theta_n$ has a conditional cdf

$$P(\tau_{n+1} \le t \mid \tau_0, \theta_0, \ldots, \tau_n, \theta_n) = F_{\theta_n}(t)$$

for some family of cdfs $F_\theta(t)$ indexed by $\theta$. Then $C(t) = \sum_{i=1}^{\lfloor t \rfloor} \tau_i$, $t \ge 0$.

Now assume that $c_n = c/n$ for $c > 0$ and that $\beta'$ is continuously differentiable with $c\beta''(\theta^*) > 1/2$. Kersting (1977) and Ruppert (1982) have shown that, under mild additional regularity assumptions, $\mathcal{Y}_\epsilon \Rightarrow \mathcal{Y}$ as $\epsilon \to 0$, as needed for Theorem 1, with $\gamma = 1/2$ and

$$\mathcal{Y}(t) = \sigma t^{-(D+1)} B(t^{2D+1}), \tag{18}$$

where $D = c\beta''(\theta^*) - 1$, $\sigma^2 = c^2k^2(2D + 1)^{-1}$ and $k^2 = \mathrm{Var}\,Z(\theta^*)$. Note that the limit process $\{\mathcal{Y}(t): t \ge 0\}$ is not of the form $\{\sigma B(t)/t: t \ge 0\}$, but we still have the canonical case, because $\mathcal{Y}(t)$ in (18) has the same one-dimensional marginal distributions as $\sigma t^{-1}B(t)$ for all $t$ (which can be seen by calculating the variances: both equal $\sigma^2/t$). Hence, together with (5), (18) implies that the FCLT (6) holds.

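To establish (5) with $\beta = 1$ we can apply Theorem 2. For this purpose, let

$$\lambda^{-1}(\theta) = \int_0^\infty t\,dF_\theta(t) \quad \text{and} \quad \sigma^2(\theta) = \int_0^\infty [t - \lambda^{-1}(\theta)]^2\,dF_\theta(t). \tag{19}$$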
If $\sup\{\sigma^2(\theta): \theta \in R\} < \infty$ and $\lambda^{-1}$ is continuous in the neighborhood of $\theta^*$, then the conditions of Theorem 2 hold with $b = 0$ and $d = 1$, because of the well known convergence $\theta_n \to \theta^*$ w.p.1. Hence, $t^{-1}C(t) \to \lambda^{-1}$ w.p.1 as $t \to \infty$.
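
A minimal sketch of the RM recursion with step sizes $c/n$ is given below. The quadratic objective, the noise model and the names `robbins_monro` and `z` are hypothetical, chosen so that the second derivative at the minimizer is 1 and the choice $c = 1$ satisfies the step-size condition.

```python
import random


def robbins_monro(theta0, c, n_steps, noisy_derivative, rng):
    """Iterate theta <- theta - (c / n) * X, where X is drawn as Z(theta)."""
    theta = theta0
    for n in range(1, n_steps + 1):
        theta -= (c / n) * noisy_derivative(theta, rng)
    return theta


def z(theta, rng):
    # Hypothetical model: beta(theta) = (theta - 2)^2 / 2, so beta'(theta) = theta - 2,
    # observed with additive N(0, 1) noise.
    return (theta - 2.0) + rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    print(robbins_monro(0.0, 1.0, 10_000, z, random.Random(1)))
```

With this particular choice the iterate after $n$ steps equals 2 minus the average of the first $n$ noise terms, so it behaves exactly like a sample mean, in line with the canonical rate $\gamma = 1/2$.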


5.  Subcanonical  Estimator  Convergence  Rates 

In this section we consider examples in which FCLTs hold for the estimation process, but with a rate $\gamma < 1/2$. Hence, the cost rate $\lambda^{-1}$ appears in the asymptotic efficiency value $v$ in (11) raised to the power $2\gamma < 1$, so that principle (1) needs to be modified as indicated in Remark 3.6. These examples with subcanonical convergence rates are leading candidates for VRTs.


Example 5.1. The Kiefer-Wolfowitz Stochastic Approximation Algorithm. Unlike the RM stochastic approximation algorithm in Example 4.5, the Kiefer-Wolfowitz (1952) stochastic approximation algorithm (denoted by KW) yields a subcanonical estimator convergence rate. The subcanonical convergence rate occurs because now we must estimate derivatives with finite differences. As before, our goal is to find a parameter $\alpha \equiv \theta^*$ that minimizes a smooth function $\beta(\theta)$. Now we assume that $\beta(\theta)$ can be represented as $\beta(\theta) = EZ(\theta)$. Successive estimates of $\theta^*$ are $\theta_{n+1} = \theta_n - c_n X_{n+1}$, where $\{c_n: n \ge 0\}$ is a sequence of deterministic constants and $X_{n+1}$ is independently generated conditional on $\theta_n$, i.e.,

$$P(X_{n+1} \in A \mid \theta_0, X_0, \ldots, \theta_n, X_n) = P\left(\frac{Z(\theta_n + h_{n+1}) - Z(\theta_n - h_{n+1})}{2h_{n+1}} \in A\right),$$

where $Z(\theta_n + h_{n+1})$ and $Z(\theta_n - h_{n+1})$ are independently generated. As in Example 4.5, the estimation process is $Y(t) = \theta_{\lfloor t \rfloor}$, $t \ge 0$. Suppose that the constants $c_n$ and $h_n$ are chosen to be of the form $c_n = cn^{-1}$ and $h_n = hn^{-1/6}$ for $c > 0$ and $h > 0$. Assume that $\beta$ is three times continuously differentiable on $R$ and that $\theta^*$ is the unique solution of $\beta'(\theta) = 0$. We further require that $c$ satisfy $c\beta''(\theta^*) > 1/3$. Then Ruppert (1982) shows that, under mild additional regularity conditions,

$$n^{1/3}(Y(nt) - \alpha) \Rightarrow \sigma t^{-b} B(t^{2A+1}) \quad \text{in } D \text{ as } n \to \infty, \tag{20}$$

where $b = c\beta''(\theta^*)$, $A = b - 5/6$, $\sigma^2 = c^2k^2/[(2A + 1)(4h^2)]$, $k^2 = 2\,\mathrm{Var}\,Z(\theta^*)$, and $B$ is a standard Brownian motion.

The cost process $C(t)$ can be treated as in Example 4.5, so that under the regularity conditions there, (5) holds with $\beta = 1$. Combining (5), (20) and Theorem 1 we obtain

$$c^{1/3}(Y(T(c)) - \alpha) \Rightarrow \sigma\lambda^{-b}B(\lambda^{2A+1}) \stackrel{d}{=} \sigma\lambda^{-1/3}N(0,1). \tag{21}$$

The limit in (21) is a centered Gaussian as in the examples of Section 4, but the asymptotic efficiency rate in (11) is $r = 2/3$ and the asymptotic efficiency value $v$ is inversely proportional to $\lambda^{-2/3}\sigma^2$. The non-canonical estimator convergence rate leads to the cost rate $\lambda^{-1}$ in $v$ in (11) being raised to the power $2\gamma \ne 1$.
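
The KW recursion differs from RM only in how the derivative observation is formed. The sketch below is a hedged illustration: it takes $c_n = c/n$ and, as an assumption, $h_n = hn^{-1/6}$, the increment consistent with the $n^{1/3}$ rate in (20). The objective, the noise level and the names `kiefer_wolfowitz` and `z` are hypothetical.

```python
import random


def kiefer_wolfowitz(theta0, c, h, n_steps, noisy_value, rng):
    """KW iteration with c_n = c / n and symmetric differences of width h_n."""
    theta = theta0
    for n in range(1, n_steps + 1):
        c_n = c / n
        h_n = h * n ** (-1.0 / 6.0)  # assumed increment schedule
        # Two independently generated function observations.
        upper = noisy_value(theta + h_n, rng)
        lower = noisy_value(theta - h_n, rng)
        theta -= c_n * (upper - lower) / (2.0 * h_n)
    return theta


def z(theta, rng):
    # Hypothetical model: beta(theta) = (theta - 2)^2 observed with N(0, 0.1^2) noise.
    return (theta - 2.0) ** 2 + rng.gauss(0.0, 0.1)


if __name__ == "__main__":
    print(kiefer_wolfowitz(0.0, 1.0, 1.0, 20_000, z, random.Random(3)))
```

Because the difference quotient divides the noise by a shrinking $h_n$, the variance of each observation grows with $n$, which is the source of the slower rate.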

Example 5.2. Recursive Derivative Estimators. Suppose that our goal is to estimate $\alpha \equiv \beta'(\theta_0)$, where $\beta(\theta)$ is a smooth function of $\theta$ which can be represented as $\beta(\theta) = EZ(\theta)$ for each $\theta$ in an open interval about $\theta_0$. We can estimate $\alpha$ via the sample mean $\bar{X}_n = n^{-1}\sum_{k=1}^n X_k$, where the $X_k$ are independently generated, with $X_k$ being the random forward difference

$$X_k \equiv [Z_k(\theta_0 + h_k) - Z_k(\theta_0)]/h_k, \quad k \ge 1,$$

and $Z_k(\theta_0 + h_k)$ and $Z_k(\theta_0)$ are independently generated. The resulting estimation process is $Y(t) = \bar{X}_{\lfloor t \rfloor}$, $t \ge 0$. This estimator is a recursive version of a derivative estimator studied by Zazanis and Suri (1986). Their estimator is $n^{-1}\sum_{k=1}^n X_{k,n}$, where $X_{k,n} = [Z_k(\theta_0 + h_n) - Z_k(\theta_0)]/h_n$. In contrast to Zazanis and Suri's estimator, note that we can easily compute our $\bar{X}_{n+1}$ from $\bar{X}_n$ by setting

$$\bar{X}_{n+1} = (n\bar{X}_n + X_{n+1})/(n + 1),$$

but our estimator is harder to analyze because the random variables $X_k$ are not identically distributed. Thus this example is not a special case of Example 4.1.

Suppose that $\tau(\theta)$ is the computer time required to calculate $Z(\theta)$ and $\phi_k$ is the time required to compute $X_k$ from $Z_k(\theta_0 + h_k)$ and $Z_k(\theta_0)$. Then a reasonable approximation for the cost process might be

$$C(t) = \sum_{k=1}^{\lfloor t \rfloor} (\tau_k(\theta_0 + h_k) + \tau_k(\theta_0) + \phi_k), \quad t \ge 0,$$

where the r.v.'s $\tau_k(\theta_0 + h_k)$ and $\tau_k(\theta_0)$ are independently generated. As in Example 4.5, it is possible to impose conditions so that we can apply Theorem 2 to obtain (5) with $\beta = 1$.
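
The recursive update of the sample mean can be sketched as follows. The increments are taken as $h_k = hk^{-1/4}$; the response model and the names `recursive_forward_difference` and `z` are hypothetical.

```python
import random


def recursive_forward_difference(theta0, h, n, noisy_value, rng):
    """Running mean of forward differences with increments h_k = h * k**(-1/4)."""
    mean = 0.0
    for k in range(1, n + 1):
        h_k = h * k ** -0.25
        # Two independently generated observations per difference.
        x_k = (noisy_value(theta0 + h_k, rng) - noisy_value(theta0, rng)) / h_k
        mean += (x_k - mean) / k  # equals ((k - 1) * mean + x_k) / k
    return mean


def z(theta, rng):
    # Hypothetical response: beta(theta) = theta^2 observed with N(0, 0.1^2) noise.
    return theta * theta + rng.gauss(0.0, 0.1)


if __name__ == "__main__":
    # The target is beta'(1) = 2; the forward difference adds a bias of order n**(-1/4).
    print(recursive_forward_difference(1.0, 1.0, 10_000, z, random.Random(2)))
```

For this quadratic response each forward difference has mean $2\theta_0 + h_k$, so the running mean carries a bias of about $(4/3)hn^{-1/4}$ that is of the same order as its standard deviation.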

In  order  to  apply  Theorem  1  to  characterize  the  asymptotic  efficiency,  we  establish  a 
FCLT  for  the  estimation  process  Y.  The  limit  process  is  of  particular  interest  because  it  is 
not  centered.  Hence,  the  approach  to  asymptotic  efficiency  in  Remark  3.3  is  not  possible. 

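Theorem 4. Suppose that $Z(\theta) \Rightarrow Z(\theta_0)$ as $\theta \to \theta_0$ and that $\{Z(\theta)^2: \theta_0 - \epsilon \le \theta \le \theta_0 + \epsilon\}$ is uniformly integrable. If $\beta$ is twice continuously differentiable in $(\theta_0 - \epsilon, \theta_0 + \epsilon)$ and $h_k = hk^{-1/4}$ with $h > 0$, then

$$\epsilon^{-1/4}(Y(t/\epsilon) - \alpha) \Rightarrow \frac{\kappa B(t^{3/2})}{t} + \frac{\eta}{t^{1/4}} \quad \text{in } D \text{ as } \epsilon \to 0,$$

where $\kappa^2 = 4\,\mathrm{Var}\,Z(\theta_0)/3h^2$, $\eta = 2\beta''(\theta_0)h/3$ and $B(t)$ is standard Brownian motion. Hence, under the conditions of Theorem 4 and (5),

$$c^{1/4}(Y(T(c)) - \alpha) \Rightarrow \frac{\kappa B(\lambda^{3/2})}{\lambda} + \frac{\eta}{\lambda^{1/4}} \stackrel{d}{=} \lambda^{-1/4}N(\eta, \kappa^2) \quad \text{in } R \text{ as } c \to \infty$$

and, under the extra conditions of Corollary 1 to Theorem 1,

$$\lim_{c \to \infty} c^{1/2}R(c) = \frac{L''(\alpha)}{2}\lambda^{-1/2}(\kappa^2 + \eta^2). \tag{22}$$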


Now the asymptotic efficiency principle in Section 4 needs to be modified in three ways: First, the asymptotic efficiency rate in (11) is $r = 1/2$ instead of $r = 1$; second, the asymptotic cost rate $\lambda^{-1}$ appears in the value $v$ in (11) in the form $\lambda^{-1/2}$ instead of $\lambda^{-1}$; and, third, the variance has to be replaced by the second moment.


An important question that arises in this setting is the choice of the constant $h$ that determines the difference increment $h_k = hk^{-1/4}$ used in the $k$th finite-difference approximation $X_k$. For example, in the setting of (22), we want to minimize the second moment of the limiting normal distribution,

$$\kappa^2 + \eta^2 = \frac{4\,\mathrm{Var}\,Z(\theta_0)}{3h^2} + \frac{4\beta''(\theta_0)^2h^2}{9}. \tag{23}$$

By differentiating, we see that the value $h^*$ of $h$ that minimizes (23) satisfies

$$(h^*)^4 = \frac{3\,\mathrm{Var}\,Z(\theta_0)}{\beta''(\theta_0)^2}. \tag{24}$$


This analysis based on (22) is equivalent to using a squared error loss function. If, instead, the loss function were $L(a) = |a - \alpha|^p$ for $p \ne 2$, then we would want to minimize the $p$th absolute moment of the limiting Gaussian distribution, which typically leads to a different minimizing value $h^*$. Thus, when the loss function does not satisfy the conditions of Corollary 1 to Theorem 1 and the limiting distribution $\mathcal{Y}(1)$ is not centered Gaussian, the form of the loss function can affect asymptotic efficiency.
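
The minimization behind (23) and (24) can be checked with a few lines of code. The sketch assumes the values $\kappa^2 = 4\,\mathrm{Var}\,Z(\theta_0)/3h^2$ and $\eta = 2\beta''(\theta_0)h/3$ from Theorem 4; the function names and the numerical inputs are hypothetical.

```python
import math


def limiting_second_moment(h, var_z, beta2):
    """kappa^2 + eta^2 as a function of the increment constant h."""
    return 4.0 * var_z / (3.0 * h * h) + 4.0 * beta2 ** 2 * h * h / 9.0


def optimal_increment(var_z, beta2):
    """Minimizer of the second moment: h**4 = 3 * Var Z / beta''**2."""
    return (3.0 * var_z / beta2 ** 2) ** 0.25


if __name__ == "__main__":
    var_z, beta2 = 1.0, 2.0  # hypothetical Var Z(theta_0) and beta''(theta_0)
    h_star = optimal_increment(var_z, beta2)
    print(h_star, limiting_second_moment(h_star, var_z, beta2))
```

At the minimizer the two terms of the second moment are equal, so the variance and squared-bias contributions are balanced.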


Example 5.3. More General Recursive Estimators. We now consider a generalization of Example 5.2 that includes certain replication schemes for limiting expectations in Fox and Glynn (1989b). Suppose that the parameter $\alpha$ can be represented as the limit of $EX_n$ for a sequence of random variables $\{X_n: n \ge 1\}$. The proposed estimator is the sample mean $\bar{X}_n$, where the r.v.'s $X_i$ are taken to be independent (but typically not identically distributed). Then the estimation process is $Y(t) = \bar{X}_{\lfloor t \rfloor}$, $t \ge 1$.

As in Example 5.2, let $\tilde{X}_k = X_k - EX_k$. We characterize the limiting behavior by relating the asymptotic behavior of $E\tilde{X}_n^2$ and $EX_n - \alpha$ as $n \to \infty$. There are three cases, one of which involves a non-centered Gaussian limit, as in Example 5.2.

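Theorem 5. Suppose that $\{n^{2\gamma-1}\tilde{X}_n^2: n \ge 1\}$ is uniformly integrable and $n^{2\gamma-1}E\tilde{X}_n^2 \to \sigma^2$ as $n \to \infty$ for $0 < \gamma \le 1/2$. Suppose that $n^{\eta}(EX_n - \alpha) \to b$ as $n \to \infty$, $0 < \eta < 1$.

(a) If $\eta > \gamma$, then

$$\epsilon^{-\gamma}(Y(t/\epsilon) - \alpha) \Rightarrow \left[\frac{\sigma^2}{2 - 2\gamma}\right]^{1/2} \frac{B(t^{2-2\gamma})}{t} \quad \text{in } D \text{ as } \epsilon \to 0.$$

(b) If $\eta = \gamma$, then

$$\epsilon^{-\gamma}(Y(t/\epsilon) - \alpha) \Rightarrow \left[\frac{\sigma^2}{2 - 2\gamma}\right]^{1/2} \frac{B(t^{2-2\gamma})}{t} + \frac{bt^{-\gamma}}{1 - \gamma} \quad \text{in } D \text{ as } \epsilon \to 0.$$

(c) If $\eta < \gamma$, then

$$\epsilon^{-\eta}(Y(t/\epsilon) - \alpha) \to \frac{bt^{-\eta}}{1 - \eta} \quad \text{in } D \text{ as } \epsilon \to 0.$$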


The  following  corollary  describes  the  combination  of  Theorem  5  with  a  SLLN  for  the 
cost  process.  Motivated  by  Fox  and  Glynn  (1989b),  we  allow  nonlinear  growth. 

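Corollary. In addition to the assumptions of Theorem 5, suppose that the cost of generating $X_n$ is $\tau_n$, where $\{\tau_n\}$ is an independent sequence with $n^{-p}E\tau_n \to a$ and $n^{-\beta}\,\mathrm{Var}\,\tau_n \to d$, where $p > -1$, $a > 0$ and $\beta < 2p + 1$. Then

$$n^{-p-1}\sum_{i=1}^n \tau_i \to \frac{a}{1 + p} \quad \text{w.p.1 as } n \to \infty.$$

(a) If $\eta > \gamma$, then

$$c^{\gamma/(1+p)}(Y(T(c)) - \alpha) \Rightarrow \left[\frac{a}{1 + p}\right]^{\gamma/(1+p)} \left[\frac{\sigma^2}{2 - 2\gamma}\right]^{1/2} N(0,1) \quad \text{in } R \text{ as } c \to \infty.$$

(b) If $\eta = \gamma$, then

$$c^{\gamma/(1+p)}(Y(T(c)) - \alpha) \Rightarrow \left[\frac{a}{1 + p}\right]^{\gamma/(1+p)} \left( \left[\frac{\sigma^2}{2 - 2\gamma}\right]^{1/2} N(0,1) + \frac{b}{1 - \gamma} \right) \quad \text{in } R \text{ as } c \to \infty.$$

(c) If $\eta < \gamma$, then

$$c^{\eta/(1+p)}(Y(T(c)) - \alpha) \to \left[\frac{a}{1 + p}\right]^{\eta/(1+p)} \frac{b}{1 - \eta} \quad \text{in } R \text{ as } c \to \infty.$$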


Typically $\gamma < 1/2$, so that $\gamma/(1 + p) < 1/2$ and the convergence rate of both $Y(t)$ and $Y(T(c))$ is subcanonical. However, Theorem 5 and its Corollary also cover the canonical convergence when $\gamma = 1/2$ and $p = 0$.

Although not stated in the full generality of the results in Fox and Glynn (1989b), because the results there permit non-polynomial growth rates, Theorem 5 and its Corollary provide improvements by treating recursively defined estimators and allowing the $\tau_i$'s to be random.

Example 5.4. Long-Range Dependence. Subcanonical estimator convergence rates also can arise in the estimation of steady-state means as in Example 4.3 when there is long-range dependence in the underlying stochastic process $X$. Instead of (16), a FCLT may hold with $\gamma < 1/2$. For examples of long-range dependence, see Mandelbrot (1977), Taqqu (1982), Cox (1984), Vervaat (1985) and references cited there.


6. Supercanonical Estimator Convergence Rates

In  this  section  we  consider  a  variant  of  a  rotation  estimator  proposed  by  Fishman  and 
Huang  (1983)  and  show  that  it  possesses  a  supercanonical  rate  of  convergence.  We  also 
explain  the  supercanonical  convergence  by  showing  the  connection  to  numerical  integration 
using  the  rectangular  rule,  as  discussed  on  p.  53  of  Davis  and  Rabinowitz  (1984). 

Example 6.1. Monte Carlo Integration with Rotation. Our goal is to estimate $\alpha = \int_0^1 f(x)\,dx$. Note that $\alpha = Ef(U)$, where $U$ is uniformly distributed on $[0,1]$, and that $U \oplus x$ is also uniformly distributed on $[0,1]$ for any $x$, where $\oplus$ denotes addition modulo one. Hence, we obtain an unbiased recursive rotation estimator by setting $Y(t) = Y_{\lfloor t \rfloor}$, where

$$Y_n = \frac{2}{n(n + 1)} \sum_{k=1}^n X_k, \quad n \ge 1, \tag{25}$$

and

$$X_k = \sum_{j=0}^{k-1} f\left(U_k \oplus \frac{j}{k}\right), \quad k \ge 1. \tag{26}$$


The asymptotic behavior of $X_n$ alone can be regarded as a stochastic analog of a well known theorem for Riemann sum (rectangular rule) approximation of Riemann integrals; see p. 53 of Davis and Rabinowitz (1984).
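
A minimal sketch of the rotation estimator follows. It assumes that (25) and (26) read $Y_n = \frac{2}{n(n+1)}\sum_{k=1}^n X_k$ and $X_k = \sum_{j=0}^{k-1} f(U_k \oplus j/k)$ with independent uniform $U_k$; the function names and the test integrand are hypothetical.

```python
import random


def rotation_block(f, k, rng):
    """X_k: sum of f over k equally spaced points shifted by one uniform draw."""
    u = rng.random()
    return sum(f((u + j / k) % 1.0) for j in range(k))


def rotation_estimator(f, n, rng):
    """Y_n = 2 / (n (n + 1)) * (X_1 + ... + X_n); uses n (n + 1) / 2 evaluations."""
    total = sum(rotation_block(f, k, rng) for k in range(1, n + 1))
    return 2.0 * total / (n * (n + 1))


if __name__ == "__main__":
    # Hypothetical integrand f(x) = x^2 with integral 1/3 over [0, 1].
    print(rotation_estimator(lambda x: x * x, 200, random.Random(4)))
```

Each block $X_k$ has mean $k\alpha$, so the weights $2/(n(n+1))$ make $Y_n$ unbiased, while the bounded error of each block is what produces the faster-than-canonical rate.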

Theorem 6. Suppose that the derivative $f'$ of $f$ exists, is bounded and is Riemann integrable. Then

$$(X_n - n\alpha) \Rightarrow (f(1) - f(0))\left(U - \tfrac{1}{2}\right) \quad \text{in } R \text{ as } n \to \infty$$

and $\{|X_n - n\alpha|^p: n \ge 1\}$ is uniformly integrable for all $p > 0$, so that

$$E(|X_n - n\alpha|^p) \to |f(1) - f(0)|^p\,E\left[\left|U - \tfrac{1}{2}\right|^p\right] \quad \text{as } n \to \infty. \tag{27}$$

Now we obtain the FCLT for the estimator $Y$. Let $\lceil x \rceil$ be the smallest integer greater than $x$. (For $x$ integer, $\lfloor x \rfloor = x$ and $\lceil x \rceil = x + 1$.)


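Theorem 7. Under the conditions of Theorem 6 with $X_k$ in (26),

$$n^{-1/2}\sum_{k=1}^{\lfloor nt \rfloor}(X_k - k\alpha) \Rightarrow \frac{f(1) - f(0)}{2\sqrt{3}}\,B(t) \quad \text{in } D([0,\infty), R) \text{ as } n \to \infty,$$

so that

$$\epsilon^{-3/2}(Y(t/\epsilon) - \alpha) \Rightarrow \frac{f(1) - f(0)}{\sqrt{3}}\,\frac{B(t)}{t^2} \quad \text{in } D((0,\infty), R) \text{ as } \epsilon \to 0.$$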


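Let $\tau_k$ be the r.v. representing the time to generate $X_k$. Since $k$ function evaluations are required to generate $X_k$, it is reasonable to assume that $E\tau_k = ak + b$ and $\mathrm{Var}\,\tau_k = ck + d$ for nonnegative constants $a$, $b$, $c$ and $d$. (We expect $c$ and $d$ to be small.) Then, by Theorem 2,

$$n^{-2}\sum_{k=1}^n \tau_k \to \frac{a}{2} \quad \text{w.p.1 as } n \to \infty. \tag{28}$$

Corollary. With (28) and the FCLT for $Y$ in Theorem 7,

$$c^{3/4}(Y(T(ct)) - \alpha) \Rightarrow \frac{f(1) - f(0)}{\sqrt{3}}\,\frac{B(\sqrt{2t/a})}{2t/a} \quad \text{in } D \text{ as } c \to \infty,$$

$$c^{3/4}(Y(T(c)) - \alpha) \Rightarrow \left(\frac{a}{2}\right)^{3/4}\frac{f(1) - f(0)}{\sqrt{3}}\,N(0,1) \quad \text{in } R \text{ as } c \to \infty$$

and, assuming a loss function as in Corollary 1 to Theorem 1 plus uniform integrability,

$$\lim_{c \to \infty} c^{3/2}R(c) = \frac{L''(\alpha)}{2}\left(\frac{a}{2}\right)^{3/2}\frac{(f(1) - f(0))^2}{3}.$$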


7. Independent Replications Together with Other Estimators

We now consider independent replications together with other estimators. Let the framework in Section 1 be modified by having the experiment in (iii) be a $k$-tuple $\{(Y^i, C^i): 1 \le i \le k\}$ of independent stochastic processes. (The superscript is an index, not a power.) Let the overall cost process be $C(t) = C^1(t) + \cdots + C^k(t)$. Suppose that we also combine the observations by averaging in the usual way, i.e., by using the estimator $Y(t) = [Y^1(t) + \cdots + Y^k(t)]/k$.

We are thinking of the number $k$ of replications being fixed with the length of the experiment being indexed by $t$. The final estimator is then

$$Y(T(c)) = [Y^1(T(c)) + \cdots + Y^k(T(c))]/k, \tag{29}$$

where $T(c)$ is defined as before.

With this modified framework, it is of interest to know how independent replications affect the efficiency of an experiment. Hammersley and Handscomb (1964), p. 51, assert that independent replications do not alter the efficiency. We show that this is the case with a centered Gaussian limit if and only if $\gamma/\beta = 1/2$ in (6).

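Theorem 8. (a) If the conditions of Theorem 1 hold for each $i$, then

$$\epsilon^{-\gamma}(Y(t/\epsilon) - \alpha) \Rightarrow k^{-1}[\mathcal{Y}^1(t) + \cdots + \mathcal{Y}^k(t)] \quad \text{in } D \text{ as } \epsilon \to 0.$$

If, in addition, (5) holds for each $i$, then

$$t^{-\beta}C(t) \to k\lambda^{-\beta} \quad \text{w.p.1 as } t \to \infty,$$

so that

$$c^{-1/\beta}T(c) \to k^{-1/\beta}\lambda \quad \text{w.p.1 as } c \to \infty,$$

$$c^{\gamma/\beta}[Y(T(ct)) - \alpha] \Rightarrow k^{-1}[\mathcal{Y}^1(k^{-1/\beta}\lambda t^{1/\beta}) + \cdots + \mathcal{Y}^k(k^{-1/\beta}\lambda t^{1/\beta})] \quad \text{in } D \text{ as } c \to \infty$$

and

$$c^{\gamma/\beta}[Y(T(c)) - \alpha] \Rightarrow k^{(\gamma/\beta) - 1}\lambda^{-\gamma}(\mathcal{Y}^1(1) + \cdots + \mathcal{Y}^k(1)) \quad \text{in } R \text{ as } c \to \infty.$$

(b) If, in addition, $\mathcal{Y}(1) \stackrel{d}{=} N(\mu, \sigma^2)$, then $\mathcal{Y}^1(1) + \cdots + \mathcal{Y}^k(1) \stackrel{d}{=} N(k\mu, k\sigma^2)$ and

$$c^{\gamma/\beta}[Y(T(c)) - \alpha] \Rightarrow k^{(\gamma/\beta) - (1/2)}\lambda^{-\gamma}(\sqrt{k}\,\mu + \sigma N(0,1)) \quad \text{in } R \text{ as } c \to \infty.$$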

At least when $\mu = 0$, Theorem 8(b) implies that independent replications cause the efficiency to get better, remain unchanged or get worse, respectively, when $\gamma/\beta < 1/2$, $\gamma/\beta = 1/2$ or $\gamma/\beta > 1/2$.
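
The three regimes can be made concrete with a small helper. The sketch assumes a centered Gaussian limit ($\mu = 0$), in which case splitting a fixed budget over $k$ independent replications multiplies the standard deviation of the limit by $k^{\gamma/\beta - 1/2}$; the function name `replication_scale` is hypothetical.

```python
def replication_scale(k, gamma, beta):
    """Factor applied to the limiting standard deviation when a fixed budget
    is split over k independent replications (centered Gaussian case)."""
    return k ** (gamma / beta - 0.5)


if __name__ == "__main__":
    k = 8
    print(replication_scale(k, 0.5, 1.0))        # canonical case: no change
    print(replication_scale(k, 1.0 / 3.0, 1.0))  # rate of Example 5.1: smaller
    print(replication_scale(k, 1.5, 2.0))        # rate of Example 6.1: larger
```

A factor below one means the replicated experiment is asymptotically more efficient than a single long run, and a factor above one means it is less efficient.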

8.  Proofs 

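Proof of Lemma 1. Suppose that $t^{-\beta}C(t) \to \lambda^{-\beta}$ w.p.1 as $t \to \infty$. Then $T(c) \to \infty$ w.p.1 as $c \to \infty$. Since

$$C(T(c) - 1) \le c \le C(T(c)) \quad \text{for all } c, \tag{30}$$

we can divide through by $T(c)^\beta$ in (30) and let $c \to \infty$ to get $c^{-1}T(c)^\beta \to \lambda^\beta$ as $c \to \infty$. Finally take $\beta$th roots. Starting with $c^{-1/\beta}T(c) \to \lambda$ w.p.1 as $c \to \infty$, use

$$T(C(t) - 1) \le t \le T(C(t)) \quad \text{for all } t$$

and reason similarly.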

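Proof of Lemma 2. We actually establish a stronger result.

Lemma 3. Let $X(t)$ be a random element of $D([0,\infty), R)$. If $t^{-\beta}X(t) \to \lambda$ w.p.1 as $t \to \infty$ for some $\beta > 0$, then for each $T > 0$

$$\sup_{0 \le t \le T}\{|\epsilon^\beta X(t/\epsilon) - \lambda t^\beta|\} \to 0 \quad \text{w.p.1 as } \epsilon \to 0.$$

Proof. For arbitrary $\delta > 0$, choose $t_0$ so that $|t^{-\beta}X(t) - \lambda| < \delta$ for all $t > t_0$, which we can do by the assumed w.p.1 convergence. Then considering the supremum separately over $[0, \epsilon t_0]$ and $[\epsilon t_0, T]$, we obtain

$$\sup_{0 \le t \le T}\{|\epsilon^\beta X(t/\epsilon) - \lambda t^\beta|\} \le \epsilon^\beta \sup_{0 \le t \le t_0}\{|X(t)|\} + \lambda(\epsilon t_0)^\beta + \sup_{\epsilon t_0 \le t \le T}\left\{t^\beta\left|\frac{X(t/\epsilon)}{(t/\epsilon)^\beta} - \lambda\right|\right\}$$

$$\le \epsilon^\beta \sup_{0 \le t \le t_0}\{|X(t)|\} + \lambda(\epsilon t_0)^\beta + T^\beta\delta.$$

Since $X$ is a random element of $D$, $\sup_{0 \le t \le t_0}\{|X(t)|\} < \infty$. Finally, let $\epsilon \to 0$ and then let $\delta \to 0$ to obtain the desired result.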


Proof of Theorem 1. By Lemmas 1 and 2, the SLLN (5) for $C(t)$ implies a FSLLN for $T(c)$. By (4) and Theorem 4.4 of Billingsley (1968),

$$(c^{\gamma/\beta}[Y(c^{1/\beta}t) - \alpha],\ c^{-1/\beta}T(ct)) \Rightarrow (\mathcal{Y}(t),\ \lambda t^{1/\beta}) \tag{31}$$

in $D((0,\infty), R) \times D([0,\infty), R)$ as $c \to \infty$. Now apply the continuous mapping theorem (Theorem 3.1 of Billingsley) with the composition map, which is continuous because $\lambda t^{1/\beta}$ is continuous and strictly increasing; see Theorem 3.1 of Whitt (1980). We must also make sure that the range of $c^{-1/\beta}T(ct)$ is contained in the domain of $c^{\gamma/\beta}[Y(c^{1/\beta}t) - \alpha]$. Since we are establishing convergence in $D((0,\infty), R)$, it suffices to establish convergence in $D([t_1,\infty), R)$ for all $t_1 > 0$; see Section 2 of Whitt (1980). For any given $t_1$, choose $t_0$ such that $\lambda t_1^{1/\beta} > t_0$, so that $c^{-1/\beta}T(ct) > t_0$ for $t \ge t_1$ w.p.1 for all sufficiently large $c$. Since the limit process $\mathcal{Y}$ is continuous at $t$ w.p.1 for each $t$, the convergence in (31) also holds on $D([t_0,\infty), R) \times D([t_1,\infty), R)$. Then replace $c^{-1/\beta}T(ct)$ by $\max\{t_0, c^{-1/\beta}T(ct)\}$ on $[t_1,\infty)$ and apply Theorem 4.1 of Billingsley to show that the composition argument above remains valid on $D([t_0,\infty), R) \times D([t_1,\infty), R)$. This yields the desired weak convergence (6) in $D([t_1,\infty), R)$. Since $t_1$ was arbitrary, we obtain convergence in $D((0,\infty), R)$. To establish (7) from (6), use the continuous mapping theorem with the projection map, p. 121 of Billingsley. To see that $\mathcal{Y}(\lambda) \stackrel{d}{=} \lambda^{-\gamma}\mathcal{Y}(1)$, note that in (3)

and let $\epsilon \to 0$. ■

Proof of Corollary 1. Using Taylor's theorem, we expand $L[Y(T(c))]$ about $\alpha$ to get

$$L[Y(T(c))] = 2^{-1}L''(\xi_c)[Y(T(c)) - \alpha]^2,$$

where $\xi_c$ falls between $Y(T(c))$ and $\alpha$. Note that $\xi_c \to \alpha$ as $c \to \infty$ by virtue of (8) and $L''(\xi_c) \to L''(\alpha)$ by the assumed continuity of $L''$. Hence $c^{2\gamma/\beta}L(Y(T(ct))) \Rightarrow 2^{-1}L''(\alpha)\,\mathcal{Y}(\lambda t^{1/\beta})^2$ in $D$ as $c \to \infty$. Finally, (10) holds by virtue of the uniform integrability.

Proof of Corollary 2. Since $L[Y(T(c))] = |Y(T(c)) - \alpha|^p$,

$$c^{p\gamma/\beta}L[Y(T(c))] \Rightarrow \lambda^{-p\gamma}|\mathcal{Y}(1)|^p \quad \text{in } R \text{ as } c \to \infty$$

from Theorem 1 and the continuous mapping theorem. Hence, (12) follows from the assumed uniform integrability. ■

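Proof of Theorem 2. We show that

$$n^{-b-1}\sum_{i=1}^n [\tau_i - E(\tau_i \mid \mathcal{F}_{i-1})] \to 0 \quad \text{w.p.1 as } n \to \infty$$

and

$$n^{-b-1}\sum_{i=1}^n E(\tau_i \mid \mathcal{F}_{i-1}) \to a/(1 + b) \quad \text{w.p.1 as } n \to \infty.$$

The first limit follows from the SLLN for martingale differences, p. 243 of Feller (1971), because under the stated conditions

$$\sum_{n=1}^\infty n^{-2(b+1)}\,\mathrm{Var}[\tau_n - E(\tau_n \mid \mathcal{F}_{n-1})] < \infty.$$

The second limit follows from the following lemma.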

Lemma 4. If $\{a_k: k \ge 1\}$ is a sequence of real numbers such that $k^{-b}a_k \to a$ as $k \to \infty$ for $b > -1$, then

$$\lim_{n \to \infty} n^{-1-b}\sum_{k=1}^n a_k = a/(1 + b).$$

Proof. Suppose that $a > 0$. (Similar arguments apply for $a = 0$ and $a < 0$.) Fix an arbitrary $\epsilon > 0$. Let $k_0$ be such that $a(1 - \epsilon)k^b \le a_k \le a(1 + \epsilon)k^b$ for all $k \ge k_0$. Then, for all $n \ge k_0$,

$$-n^{-1-b}\sum_{j=1}^{k_0}(|a_j| + aj^b) + a(1 - \epsilon)n^{-1-b}\sum_{j=1}^n j^b \le n^{-1-b}\sum_{j=1}^n a_j \le n^{-1-b}\sum_{j=1}^{k_0}|a_j| + a(1 + \epsilon)n^{-1-b}\sum_{j=1}^n j^b.$$

Since $n^{-1-b}\sum_{j=1}^n j^b = n^{-1}\sum_{j=1}^n (j/n)^b$ is a Riemann sum approximation to the integral $\int_0^1 x^b\,dx = (1 + b)^{-1}$, we can let $n \to \infty$ and then $\epsilon \to 0$ to obtain the desired result.
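
Lemma 4 is easy to check numerically. The sketch below is illustrative only; the sequence $a_k = 3k^{1/2} + 5$ and the function name `normalized_partial_sum` are hypothetical. Here $b = 1/2$ and $a = 3$, so the normalized partial sums should approach $a/(1 + b) = 2$.

```python
def normalized_partial_sum(seq, b, n):
    """Return n**(-1 - b) * (seq(1) + ... + seq(n))."""
    return sum(seq(k) for k in range(1, n + 1)) / n ** (1.0 + b)


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        print(n, normalized_partial_sum(lambda k: 3.0 * k ** 0.5 + 5.0, 0.5, n))
```

The lower-order term 5 contributes roughly $5/\sqrt{n}$ to the normalized sum, so the convergence is slow but visible.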

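Proof of Theorem 3. We use the assumed weak convergence and the Skorohod representation theorem, see Whitt (1980), to construct versions such that $\epsilon^{-\gamma}[\psi_\epsilon(\lambda_\epsilon(t)) - \mu] \to \psi(t)$ and $\lambda_\epsilon(t) \to t$ uniformly over the interval $[a, b]$ w.p.1, where $\lambda_\epsilon$ are the homeomorphisms of $(0, \infty)$ associated with the Skorohod $J_1$ topology. Then we use the continuous differentiability of $g$ to expand $g(\psi_\epsilon(\lambda_\epsilon(t)))$ as

$$\epsilon^{-\gamma}[g(\psi_\epsilon(\lambda_\epsilon(t))) - g(\mu)] = \nabla g(\nu_\epsilon(t))\,\epsilon^{-\gamma}[\psi_\epsilon(\lambda_\epsilon(t)) - \mu], \tag{32}$$

where $\nu_\epsilon(t)$ is on the line segment joining $\psi_\epsilon(\lambda_\epsilon(t))$ and $\mu$, $a \le t \le b$. This implies the desired conclusion.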

Proof of Theorem 4. This is a special case of Theorem 5(b) with $\gamma = \eta = 1/4$.
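
Proof of Theorem 5. We can express $t(Y(t/\epsilon) - \alpha)$, except for a factor of $(t/\epsilon)/\lfloor t/\epsilon \rfloor$, as

$$\epsilon\sum_{i=1}^{\lfloor t/\epsilon \rfloor} X_i - \alpha t = \epsilon\sum_{i=1}^{\lfloor t/\epsilon \rfloor} \tilde{X}_i + \epsilon\sum_{i=1}^{\lfloor t/\epsilon \rfloor}(EX_i - \alpha) + (\epsilon\lfloor t/\epsilon \rfloor - t)\alpha. \tag{33}$$

To treat the first term in (33), we apply the martingale FCLT on p. 340 (part b) of Ethier and Kurtz. For this purpose, note that

$$M_\epsilon(t) \equiv \epsilon^{1-\gamma}\sum_{i=1}^{\lfloor t/\epsilon \rfloor} \tilde{X}_i, \quad t \ge 0,$$

is a martingale with quadratic variation process

$$A_\epsilon(t) = \epsilon^{2-2\gamma}\sum_{i=1}^{\lfloor t/\epsilon \rfloor} E\tilde{X}_i^2, \quad t \ge 0.$$

By Lemma 4, since $n^{2\gamma-1}E\tilde{X}_n^2 \to \sigma^2$ for $2\gamma - 1 > -1$,

$$A_\epsilon(t) \to \frac{\sigma^2t^{2-2\gamma}}{2 - 2\gamma} \quad \text{as } \epsilon \to 0.$$

By Lemma 3, this convergence is uniform on bounded intervals. Furthermore, for any $\delta > 0$,

$$\Delta_\epsilon(T) \equiv E\left[\sup_{0 \le t \le T}|M_\epsilon(t) - M_\epsilon(t-)|^2\right] = \epsilon^{2-2\gamma}E\left[\max_{1 \le i \le T/\epsilon}\tilde{X}_i^2\right]$$

$$\le \delta + \sum_{i=1}^{\lfloor T/\epsilon \rfloor}\epsilon^{2-2\gamma}E[\tilde{X}_i^2;\ \tilde{X}_i^2 > \delta\epsilon^{2\gamma-2}]$$

$$\le \delta + \epsilon\sum_{k=1}^{\lfloor T/\epsilon \rfloor}(k/T)^{2\gamma-1}E[\tilde{X}_k^2;\ \tilde{X}_k^2 > \delta(k/T)^{1-2\gamma}\epsilon^{-1}],$$

but since $\{k^{2\gamma-1}\tilde{X}_k^2\}$ is uniformly integrable, $k^{2\gamma-1}E[\tilde{X}_k^2;\ \tilde{X}_k^2 > \delta(k/T)^{1-2\gamma}\epsilon^{-1}] \to 0$ uniformly in $k$ as $\epsilon \to 0$. Since $\delta$ was arbitrary, we conclude that $\Delta_\epsilon(T) \to 0$ as $\epsilon \to 0$. Hence,

$$M_\epsilon(t) \Rightarrow [\sigma^2/(2 - 2\gamma)]^{1/2}B(t^{2-2\gamma}) \quad \text{in } D \text{ as } \epsilon \to 0. \tag{34}$$

Turning to the second term in (33), we apply Lemma 4 to deduce that

$$(\epsilon/t)^{1-\eta}\sum_{i=1}^{\lfloor t/\epsilon \rfloor}(EX_i - \alpha) \to \frac{b}{1 - \eta}.$$

By Lemma 3,

$$\epsilon^{1-\eta}\sum_{i=1}^{\lfloor t/\epsilon \rfloor}(EX_i - \alpha) \to \frac{bt^{1-\eta}}{1 - \eta} \quad \text{in } D \text{ as } \epsilon \to 0. \tag{35}$$

Finally, we have the three results (a), (b) and (c) by combining (34) and (35).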

Proof of the Corollary to Theorem 5. The SLLN for $C(t)$, i.e., the limit of $n^{-p-1}\sum_{i=1}^n \tau_i$, follows from Theorem 2 with $\beta = 1 + p$ and $\lambda^{-\beta} = a/(1 + p)$. The rest follows from Theorems 1 and 5.

Proof of Theorem 6. If $f(0) = f(1)$, then $f(x \oplus c)$ is continuous for all $c$ and $(X_n - n\alpha) \to 0$ w.p.1 by (2.1.10) on p. 53 of Davis and Rabinowitz (1984). (The argument there remains valid if $f$ is differentiable everywhere except at one point where left and right derivatives exist.) Hence, add a linear function to $f$ to make $f(0) = f(1)$ and apply Lemma 5 below to the linear function. (Of course, both the integral and the approximating sum are additive for the two functions.) For the uniform integrability, note that the bounds in (2.1.8) on p. 53 of Davis and Rabinowitz apply uniformly to the translation point $U$.

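Lemma 5. Suppose that $f(x) = ax$, $0 \le x \le 1$. Then

$$X_n - n\alpha \stackrel{d}{=} a\left(U - \tfrac{1}{2}\right) \quad \text{for all } n \ge 1.$$

Proof. Note that the points $U \oplus j/n$, $0 \le j \le n - 1$, are equally spaced with gap $1/n$, so that

$$X_n = \sum_{j=0}^{n-1} a\left(U \oplus \frac{j}{n}\right) = a\left[(nU \bmod 1) + \frac{n - 1}{2}\right],$$

while $n\alpha = na/2$ and $nU \bmod 1$ is uniformly distributed on $[0,1]$.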


Proof  of  Theorem  8.  The  FCLT  for  Y  holds  by  Theorems  3.2  and  5.1  of  Billingsley.  The 
rest  is  elementary. 